# fix: runpod adapter #3641

## Conversation
Hi @justinwlin! Thank you for your pull request and welcome to our community.

**Action Required:** In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

**Process:** In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged accordingly. If you have received this in error or have any questions, please contact us at [email protected]. Thanks!
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!
Please run pre-commit and let it fix the formatting / import issues.
Also, address the inline comments.
Hi! I plan to reopen this PR later, just so I don't spam this PR thread as I push commits :) Will reopen when ready for another review.
# What does this PR do?

Sorry @mattf, I thought I could close the other PR and reopen it, but I don't have the option to reopen it now. I just didn't want it to keep notifying maintainers while I made other commits for testing.

Continuation of: #3641. This PR fixes the RunPod adapter (#3517).

## What I fixed from before

1. Made the adapter fully OpenAI-compatible.
2. Fixed the class up, since `OpenAIMixin` had a couple of changes with the pydantic base-model stuff (a rough sketch of the resulting class shape is at the end of this description).
3. Added a test to make sure we can dynamically discover models and use the resulting identifier to make requests:

   ```bash
   curl -X GET \
     -H "Content-Type: application/json" \
     "http://localhost:8321/v1/models"
   ```

## Test Plan

### RunPod Provider Quick Start

#### Prerequisites

- Python 3.10+
- Git
- RunPod API token

#### Setup for Development

```bash
# 1. Clone and enter the repository
cd (into the repo)

# 2. Create and activate a virtual environment
python3 -m venv .venv
source .venv/bin/activate

# 3. Remove any existing llama-stack installation
pip uninstall llama-stack llama-stack-client -y

# 4. Install llama-stack in development mode
pip install -e .

# 5. Build using the local development code (found this through the Discord)
LLAMA_STACK_DIR=. llama stack build

# When prompted during the build:
# - Name: runpod-dev
# - Image type: venv
# - Inference provider: remote::runpod
# - Safety provider: llama-guard
# - Other providers: first defaults
```

#### Configure the Stack

The RunPod adapter automatically discovers models from your endpoint via the `/v1/models` API. No manual model configuration is required; just set your environment variables.

#### Run the Server

**Important: use the build-created virtual environment**

```bash
# Exit the development venv if you're in it
deactivate

# Activate the build-created venv (NOT .venv)
cd (llama-stack repo folder)
source llamastack-runpod-dev/bin/activate
```

**For the Qwen3-32B-AWQ public endpoint (recommended):**

```bash
# Set environment variables
export RUNPOD_URL="https://api.runpod.ai/v2/qwen3-32b-awq/openai/v1"
export RUNPOD_API_TOKEN="your_runpod_api_key"

# Start the server
llama stack run ~/.llama/distributions/llamastack-runpod-dev/llamastack-runpod-dev-run.yaml
```

#### Quick Test

**1. List Available Models (Dynamic Discovery)**

First, check which models are available on your RunPod endpoint:

```bash
curl -X GET \
  -H "Content-Type: application/json" \
  "http://localhost:8321/v1/models"
```

**Example Response:**

```json
{
  "data": [
    {
      "identifier": "qwen3-32b-awq",
      "provider_resource_id": "Qwen/Qwen3-32B-AWQ",
      "provider_id": "runpod",
      "type": "model",
      "metadata": {},
      "model_type": "llm"
    }
  ]
}
```

**Note:** Use the `identifier` value from the response above in the requests below.

**2. Chat Completion (Non-streaming)**

Replace `qwen3-32b-awq` with your model identifier from step 1:

```bash
curl -X POST http://localhost:8321/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-32b-awq",
    "messages": [{"role": "user", "content": "Hello, count to 3"}],
    "stream": false
  }'
```
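The same non-streaming request can also be issued from Python. Here is a minimal sketch using the `openai` client pointed at the local stack; it assumes the server exposes the OpenAI-compatible `/v1` routes used above, and the API key value is a placeholder, not a real credential:

```python
from openai import OpenAI

# Point the client at the local llama-stack server. The api_key value is a
# placeholder: the client library requires one, and this sketch assumes the
# local endpoint does not validate it.
client = OpenAI(base_url="http://localhost:8321/v1", api_key="placeholder")

response = client.chat.completions.create(
    model="qwen3-32b-awq",  # use the identifier returned by /v1/models
    messages=[{"role": "user", "content": "Hello, count to 3"}],
    stream=False,
)
print(response.choices[0].message.content)
```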
**3. Chat Completion (Streaming)**

```bash
curl -X POST http://localhost:8321/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-32b-awq",
    "messages": [{"role": "user", "content": "Count to 5"}],
    "stream": true
  }'
```

**Clean streaming output:**

```bash
curl -N -X POST http://localhost:8321/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen3-32b-awq", "messages": [{"role": "user", "content": "Count to 5"}], "stream": true}' \
  2>/dev/null | while read -r line; do
    echo "$line" | grep "^data: " | sed 's/^data: //' | jq -r '.choices[0].delta.content // empty' 2>/dev/null
done
```

**Expected Output:**

```
1
2
3
4
5
```
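For context on item 2 above ("fixed the class up"), here is a minimal sketch of the shape the adapter takes when built on `OpenAIMixin`. This is illustrative, not the actual diff: the import path, the `get_api_key()` / `get_base_url()` hooks, and the config field names (`url`, `api_token`) are assumptions inferred from the environment variables used in the test plan:

```python
# Illustrative sketch only -- names and import path are assumptions,
# not a copy of the patch.
from pydantic import BaseModel

from llama_stack.providers.utils.inference.openai_mixin import OpenAIMixin


class RunpodImplConfig(BaseModel):
    """Assumed config shape: the base URL and token correspond to
    RUNPOD_URL / RUNPOD_API_TOKEN in the run YAML."""

    url: str
    api_token: str


class RunpodInferenceAdapter(OpenAIMixin):
    """Talks to a RunPod endpoint through its OpenAI-compatible /v1 API,
    letting the mixin handle client construction and model discovery."""

    config: RunpodImplConfig

    def get_api_key(self) -> str:
        # Credential handed to the mixin's OpenAI client.
        return self.config.api_token

    def get_base_url(self) -> str:
        # e.g. https://api.runpod.ai/v2/<endpoint>/openai/v1
        return self.config.url
```

The point of this shape is that the adapter itself stays tiny: model listing (`/v1/models`) and chat completions all flow through the mixin's shared OpenAI plumbing, which is what makes the dynamic discovery in the test plan work without manual model registration.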